Generating data as a proxy for unavailable corpus data: the contextualized sentence completion task
Abstract
There is much interest in using large corpora to explore predictors of the probability of higher-level linguistic structures, but suitable corpora are not available for all languages and their varieties. We explore a task that uses discourse contexts from an existing corpus as prompts for sentence completion, and investigate how useful the method is for generating data as a proxy for unavailable corpus data. Mini databases of dative and genitive structures were obtained with the method from American and Australian participants. The databases are shown to be a good proxy for corpus data.
Similar resources
Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach
Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and the syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
Parallel Corpus Refinement as an Outlier Detection Algorithm
Filtering noisy parallel corpora, or removing mistranslations from training sets, can improve the quality of statistical machine translation. Discriminative methods for filtering the corpora, such as a maximum entropy model, need properly labeled training data, which are usually unavailable. Generating all possible sentence pairs (the Cartesian product) to generate labeled data produces an im...
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
Investigating Schematic Mental Models of Perfectionism and Approval-Seeking in Depression
Objectives: The purpose of this research is to investigate two different perspectives on depressive thinking. One viewpoint considers depression a reflection of the increasing general accessibility of negative constructs and depressive memories; the other defines depressive thoughts as a reflection of changes at a more general level of cognitive representation. Method: 54 subjects selecte...
A Study of Methods for Evaluating Verb Tense Inflection and Determining the Best Method in 3- and 4-Year-Old Children in Rasht in 1393 SH
Introduction: One domain of morphology is inflection, which adds syntactic considerations to words. This domain is affected in individuals with language disorders, so evaluating inflection in these people is important. In this study, methods of evaluating verb tense inflection were compared and the best method was determined. Methods: This study was descriptive-analytical. The participa...